Zurich - 27 & 28 June 2022

Multiple imputation (recap)

Multiple Imputation

  • Imputation phase
  • Analysis phase
  • Pooling phase
  1. incomplete data
  2. generate multiple copies of the dataset, each with differently imputed values
  3. analyze each imputed dataset
  4. pool the results of the analyses into a final study result
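A minimal base R sketch of the pooling step, assuming the standard Rubin's rules: average the m estimates, then combine the within- and between-imputation variance into a total variance. The function name `pool_rubin` and the input numbers are hypothetical. In practice `mice::pool()` applies these rules and also computes the degrees of freedom.

```r
# Pool m estimates of one parameter with Rubin's rules (sketch).
# est: the m point estimates; se: their m standard errors.
pool_rubin <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                  # pooled point estimate
  ubar <- mean(se^2)                 # average within-imputation variance
  b <- var(est)                      # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance
  list(
    estimate  = qbar,
    std.error = sqrt(total),
    within    = ubar,
    between   = b,
    lambda    = (1 + 1 / m) * b / total  # share of total variance due to missing data
  )
}

# Hypothetical slope estimates from m = 5 imputed datasets
res <- pool_rubin(est = c(0.12, 0.11, 0.13, 0.12, 0.12),
                  se  = rep(0.03, 5))
res$estimate
res$std.error
```

The between-imputation variance is what carries the imputation uncertainty: with identical estimates across imputations it is zero and the pooled standard error reduces to the average within-imputation standard error.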

Notes on multiple imputation

  • Takes imputation uncertainty into account
  • A method to improve the results of the main analysis (not a method to complete or fill in the data)
  • Make sure that the imputation model
    • Contains the relevant variables to deal with the missing data
    • Is compatible with the analysis model

Full Information Maximum Likelihood

How does it work

  • Uses all observed data, including partially observed rows, to estimate the model parameters.
  • Handles the missing data and estimates the analysis model in one step
  • Can be estimated in different ways, for example with the EM algorithm
  • Only involves the variables used in the analysis
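To make "uses all observed data" concrete, a minimal base R sketch of the casewise (FIML) log-likelihood for multivariate normal data, assuming given values for the mean vector `mu` and covariance matrix `Sigma`: each row contributes the density of only its observed variables. The function name is hypothetical; software such as lavaan maximizes this kind of function over the model parameters.

```r
# Casewise (FIML) log-likelihood for multivariate normal data with missing values.
# Each row uses only its observed variables and the matching parts of mu and Sigma.
fiml_loglik <- function(X, mu, Sigma) {
  X <- as.matrix(X)
  total <- 0
  for (i in seq_len(nrow(X))) {
    obs <- which(!is.na(X[i, ]))        # observed variables in this row
    if (length(obs) == 0) next          # a fully missing row adds nothing
    d <- X[i, obs] - mu[obs]
    S <- Sigma[obs, obs, drop = FALSE]
    logdet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
    total <- total - 0.5 * (length(obs) * log(2 * pi) + logdet + sum(d * solve(S, d)))
  }
  total
}

X <- rbind(c(1, 2),
           c(1, NA),    # partly observed row still contributes
           c(NA, NA))   # fully missing row contributes nothing
fiml_loglik(X, mu = c(0, 0), Sigma = diag(2))
```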

FIML in lavaan

  • In R, use the lavaan package with missing = "fiml".
library(lavaan)

model <- '
  # variances
  Ozone ~~ Ozone
  Solar.R ~~ Solar.R
  Wind ~~ Wind
  Temp ~~ Temp

  # covariances
  Ozone ~~ Solar.R + Wind + Temp
  Solar.R ~~ Wind + Temp
  Wind ~~ Temp
  '
# FIML estimation; meanstructure = TRUE also estimates the means (intercepts)
fit <- sem(model, data = airquality, missing = "fiml", meanstructure = TRUE)
summary(fit, standardized = TRUE)

FIML descriptives

> 
> Parameter Estimates:
> 
>   Standard errors                             Standard
>   Information                                 Observed
>   Observed information based on                Hessian
> 
> Covariances:
>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
>   Ozone ~~                                                              
>     Solar.R         942.530  266.602    3.535    0.000  942.530    0.324
>     Wind            -64.636   11.033   -5.858    0.000  -64.636   -0.570
>     Temp            209.563   31.267    6.702    0.000  209.563    0.687
>   Solar.R ~~                                                            
>     Wind            -17.335   26.211   -0.661    0.508  -17.335   -0.055
>     Temp            238.073   74.272    3.205    0.001  238.073    0.281
>   Wind ~~                                                               
>     Temp            -15.172    2.946   -5.151    0.000  -15.172   -0.458
> 
> Intercepts:
>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
>     Ozone            41.871    2.782   15.048    0.000   41.871    1.296
>     Solar.R         184.847    7.428   24.884    0.000  184.847    2.055
>     Wind              9.958    0.284   35.076    0.000    9.958    2.836
>     Temp             77.882    0.763  102.112    0.000   77.882    8.255
> 
> Variances:
>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
>     Ozone          1044.019  129.627    8.054    0.000 1044.019    1.000
>     Solar.R        8090.702  950.667    8.511    0.000 8090.702    1.000
>     Wind             12.330    1.410    8.746    0.000   12.330    1.000
>     Temp             89.006   10.176    8.746    0.000   89.006    1.000

Output explained

The output is large and gives a lot of information at once.

  • Intercepts: FIML means
    • ~1 in output tables
  • Variance: FIML variances
    • variable x ~~ variable x in output tables
  • Covariances: FIML covariances; the Std.all column gives the FIML correlations (shown because standardized = TRUE)
    • variable x ~~ variable y in output tables

Use fmi = TRUE in summary() to get the fraction of missing information (FMI) for each parameter.

FMI = the proportion of information about a parameter that is lost because of missing data, i.e. the impact of the missing data on the precision of that estimate.

Missing data patterns

  • Missing data patterns in the data (1 = observed, 0 = missing)
>      Ozone Solr.R Wind Temp
> [1,]     1      1    1    1
> [2,]     0      1    1    1
> [3,]     1      0    1    1
> [4,]     0      0    1    1
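A base R sketch of how such a pattern table can be derived, assuming the four `airquality` variables from the lavaan model above: code every value as observed (1) or missing (0) and count the unique rows. The row order may differ from the lavaan output.

```r
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
# Response indicator matrix: 1 = observed, 0 = missing
r <- (!is.na(airquality[, vars])) * 1

patterns <- unique(r)                               # one row per missing data pattern
counts <- table(apply(r, 1, paste, collapse = ""))  # number of rows per pattern
patterns
counts
```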

Missing data proportions

  • A symmetric matrix (covariance coverage) in which each element is the proportion of rows where both variables of the pair are observed.
>         Ozone Solr.R Wind  Temp 
> Ozone   0.758                   
> Solar.R 0.725 0.954             
> Wind    0.758 0.954  1.000      
> Temp    0.758 0.954  1.000 1.000
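The coverage matrix follows directly from the response indicators. A minimal base R sketch, assuming the same four `airquality` variables: the cross-product of the 0/1 indicator matrix counts, for each pair, the rows where both variables are observed.

```r
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
r <- (!is.na(airquality[, vars])) * 1   # 1 = observed, 0 = missing

# Element [j, k]: proportion of rows in which both variable j and variable k are observed
coverage <- crossprod(r) / nrow(r)
round(coverage, 3)
```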

Linear regression analysis

> 
> Parameter Estimates:
> 
>   Standard errors                             Standard
>   Information                                 Observed
>   Observed information based on                Hessian
> 
> Regressions:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>   Ozone ~                                                      
>     Solar.R           0.127    0.032    3.915    0.000    0.223
> 
> Intercepts:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>    .Ozone            18.599    6.687    2.781    0.005    0.218
> 
> Variances:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>    .Ozone           964.164  129.421    7.450    0.000    0.240

Compare with MI

>          term estimate  std.error statistic       df     p.value
> 1 (Intercept) 19.53398 7.24259142  2.697098 20.14387 0.013810123
> 2     Solar.R  0.11901 0.03438927  3.460673 22.04619 0.002219167

SEM with auxiliary variables

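  • Use auxiliary variables (related to the missingness or to the incomplete variables, but not part of the analysis model) to improve missing data handling
  • Add the auxiliary variables to the model as covariances with the outcome, the predictor and each other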
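A minimal sketch of this "saturated correlates" setup, assuming the regression of Ozone on Solar.R with Wind, Temp, Month and Day as auxiliary variables, as in the output that follows. The base R code only writes the lavaan model syntax; the commented fit call, including `fixed.x = FALSE` so that Solar.R gets its own mean and variance, is an assumption about how the model was fitted.

```r
analysis <- "Ozone ~ Solar.R"
aux <- c("Wind", "Temp", "Month", "Day")

# All pairs of auxiliary variables covary with each other
aux_pairs <- combn(aux, 2, FUN = function(p) paste(p[1], "~~", p[2]))

model_aux <- paste(c(
  analysis,
  paste("Ozone ~~", paste(aux, collapse = " + ")),    # with the outcome (residual)
  paste("Solar.R ~~", paste(aux, collapse = " + ")),  # with the predictor
  aux_pairs                                           # with each other
), collapse = "\n")
cat(model_aux)

# Assumed fit call (requires lavaan):
# fit <- lavaan::sem(model_aux, data = airquality, missing = "fiml", fixed.x = FALSE)
# summary(fit, fmi = TRUE)
```

The auxiliary variables only add covariances, so the regression part of the model (and its interpretation) stays the same.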

SEM with auxiliary variables

> 
> Parameter Estimates:
> 
>   Standard errors                             Standard
>   Information                                 Observed
>   Observed information based on                Hessian
> 
> Regressions:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>   Ozone ~                                                      
>     Solar.R           0.112    0.030    3.701    0.000    0.158
> 
> Covariances:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>   Solar.R ~~                                                   
>     Wind            -17.069   26.123   -0.653    0.514    0.046
>     Temp            232.954   73.899    3.152    0.002    0.077
>     Month            -7.321   10.557   -0.693    0.488    0.056
>     Day            -119.301   66.532   -1.793    0.073    0.051
>  .Ozone ~~                                                     
>     Wind            -63.332   10.615   -5.966    0.000    0.095
>     Temp            183.490   28.455    6.448    0.000    0.102
>     Month             8.240    3.735    2.206    0.027    0.090
>     Day               6.540   23.317    0.280    0.779    0.134
>   Wind ~~                                                      
>     Temp            -15.172    2.946   -5.151    0.000   -0.000
>     Month            -0.884    0.407   -2.171    0.030   -0.000
>     Day               0.843    2.509    0.336    0.737   -0.000
>   Temp ~~                                                      
>     Month             5.607    1.168    4.799    0.000   -0.000
>     Day             -10.886    6.796   -1.602    0.109    0.000
>   Month ~~                                                     
>     Day              -0.099    1.009   -0.098    0.922   -0.000
> 
> Intercepts:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>    .Ozone            21.819    6.194    3.523    0.000    0.152
>     Solar.R         185.534    7.410   25.038    0.000    0.042
>     Wind              9.958    0.284   35.076    0.000    0.000
>     Temp             77.882    0.763  102.112    0.000    0.000
>     Month             6.993    0.114   61.269    0.000    0.000
>     Day              15.804    0.714   22.125    0.000    0.000
> 
> Variances:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>    .Ozone           943.445  119.142    7.919    0.000    0.180
>     Solar.R        8050.793  941.698    8.549    0.000    0.045
>     Wind             12.330    1.410    8.746    0.000   -0.000
>     Temp             89.006   10.176    8.746    0.000   -0.000
>     Month             1.993    0.228    8.746    0.000    0.000
>     Day              78.066    8.925    8.746    0.000    0.000

SEM with auxiliary variables

  • Easily add auxiliary variables with semTools package

SEM with auxiliary variables

> 
> Parameter Estimates:
> 
>   Standard errors                             Standard
>   Information                                 Observed
>   Observed information based on                Hessian
> 
> Regressions:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>   Ozone ~                                                      
>     Solar.R           0.112    0.030    3.701    0.000    0.158
> 
> Covariances:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>   Wind ~~                                                      
>     Temp            -15.172    2.946   -5.151    0.000    0.000
>     Month            -0.884    0.407   -2.171    0.030   -0.000
>     Day               0.843    2.509    0.336    0.737    0.000
>   Temp ~~                                                      
>     Month             5.607    1.168    4.799    0.000    0.000
>     Day             -10.886    6.796   -1.602    0.109    0.000
>   Month ~~                                                     
>     Day              -0.099    1.009   -0.098    0.922   -0.000
>   Wind ~~                                                      
>    .Ozone           -63.332   10.615   -5.966    0.000    0.095
>   Temp ~~                                                      
>    .Ozone           183.490   28.455    6.448    0.000    0.102
>   Month ~~                                                     
>    .Ozone             8.240    3.735    2.206    0.027    0.090
>   Day ~~                                                       
>    .Ozone             6.540   23.317    0.280    0.779    0.134
>   Wind ~~                                                      
>     Solar.R         -17.069   26.123   -0.653    0.514    0.046
>   Temp ~~                                                      
>     Solar.R         232.954   73.899    3.152    0.002    0.077
>   Month ~~                                                     
>     Solar.R          -7.321   10.557   -0.693    0.488    0.056
>   Day ~~                                                       
>     Solar.R        -119.301   66.532   -1.793    0.073    0.051
> 
> Intercepts:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>    .Ozone            21.819    6.194    3.523    0.000    0.152
>     Solar.R         185.534    7.410   25.038    0.000    0.042
>     Wind              9.958    0.284   35.076    0.000    0.000
>     Temp             77.882    0.763  102.112    0.000    0.000
>     Month             6.993    0.114   61.269    0.000    0.000
>     Day              15.804    0.714   22.125    0.000    0.000
> 
> Variances:
>                    Estimate  Std.Err  z-value  P(>|z|)      FMI
>    .Ozone           943.445  119.142    7.919    0.000    0.180
>     Solar.R        8050.793  941.698    8.549    0.000    0.045
>     Wind             12.330    1.410    8.746    0.000   -0.000
>     Temp             89.006   10.176    8.746    0.000    0.000
>     Month             1.993    0.228    8.746    0.000   -0.000
>     Day              78.066    8.925    8.746    0.000    0.000

Compare with MI

>          term   estimate  std.error statistic        df      p.value
> 1 (Intercept) 20.4707060 6.42380402  3.186695  57.45493 0.0023279422
> 2     Solar.R  0.1135789 0.02925834  3.881931 109.81512 0.0001772778

Missing data in questionnaires

Multi-item questionnaires

  • Constructs are measured indirectly, through items.
  • Items can be continuous, dichotomous or measured on a Likert scale.
  • A summary score of the items represents the construct

Summarizing item scores

  • CTT (classical test theory): sum score or mean score
  • Latent variable obtained via an item response theory (IRT) model (e.g. Rasch model or two-parameter logistic model)
  • Latent variable measured in a structural equation model (SEM)

Missing data in multi-item questionnaires

  • Missing item level scores
  • Missing full questionnaire
>   item1 item2 item3 item4 item5 Total.score
> 1     1     0     0     0     1           2
> 2    NA     1     0    NA     1          NA
> 3    NA    NA    NA    NA    NA          NA

Both lead to a missing total score

Advice in user manuals

  • User manuals often contain a statement about dealing with missing item scores
  • The advised method is not always evidence-based
  • Examples:
    • SF-36: “Items that are left blank (missing data) are not taken into account when calculating the scale scores. Hence, scale scores represent the average for all items in the scale that the respondent answered.” (Ware & Sherbourne, 1992)
    • SDQ: “The SDQ comprises of 5 items for 5 subscales. For each of the 5 scales the score can range from 0 to 10 if all items were completed. These scores can be scaled up pro-rata if at least 3 items were completed, e.g. a score of 4 based on 3 completed items can be scaled up to a score of 7 (6.67 rounded up) for 5 items.” (Goodman, 2010)
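The SDQ rule amounts to person mean imputation followed by a sum score. A minimal base R sketch of that rule; the function name is hypothetical, and rounding halves up is an assumption (the quoted example only shows 6.67 becoming 7).

```r
# Pro-rated scale score following the quoted SDQ rule:
# scale the sum of the answered items up to n_items if enough items were answered.
prorate_sdq <- function(items, n_items = 5, min_answered = 3) {
  answered <- sum(!is.na(items))
  if (answered < min_answered) return(NA_real_)
  # Assumption: round half up
  floor(sum(items, na.rm = TRUE) * n_items / answered + 0.5)
}

prorate_sdq(c(2, 1, 1, NA, NA))   # score 4 on 3 items -> 6.67 -> 7
```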

Person mean imputation

  • Single imputation of mean over the observed items (within person)
  • Equivalent to taking the average over the available items
  • The best ad hoc single imputation method available
  • Disadvantage of single imputation: imputation uncertainty is not taken into account
  • Works best when the correlations between items are high

Example of person mean imputation

>   Q1i1 Q1i2 Q1i3 Q1i4 Q1i5 TSQ1
> 1   NA   NA   NA    2    1   NA
> 2    2    5    1    1    1   10
> 3    1    1    1    1    1    5
> 4    1    1    2    4    1    9
> 5    5    5    1    5    5   21
library(dplyr)

x %>%
  # compute the average over the available items (AAI) per person
  mutate(AAI = rowMeans(select(., Q1i1:Q1i5), na.rm = TRUE)) %>%
  # replace each missing item score with that person's AAI
  mutate(across(Q1i1:Q1i5, ~ ifelse(is.na(.x), AAI, .x))) %>%
  # recompute the total score from the completed items
  mutate(TSQ1 = rowSums(select(., Q1i1:Q1i5)))
>   Q1i1 Q1i2 Q1i3 Q1i4 Q1i5 TSQ1 AAI
> 1  1.5  1.5  1.5    2    1  7.5 1.5
> 2  2.0  5.0  1.0    1    1 10.0 2.0
> 3  1.0  1.0  1.0    1    1  5.0 1.0
> 4  1.0  1.0  2.0    4    1  9.0 1.8
> 5  5.0  5.0  1.0    5    5 21.0 4.2

Note on person mean imputation

  • Can get unstable when:
    • Many items are missing.
    • Correlations between items are low.
  • Single imputation method: missing data uncertainty is not taken into account.

Multiple imputation in multi-item questionnaires

  • Imputing item scores versus imputing total scores
  • Item scores can hold valuable information
  • Total scores are often used in analyses

Advised strategy for imputation

  • When there are item scores observed, use item level imputation
  • When few or no item scores are observed, use total score imputation
  • Combine both strategies

Challenges in multi-item questionnaire missings

  • Imputation model can grow large when all items are used
  • When the total score is used in analyses, the total score should be used as a predictor for the other variables

Solutions

Imputation model can grow large when all items are used

  • Isolate item imputation per questionnaire or subscale via the predictor matrix
  • Especially relevant in longitudinal data

When the total score is used in analyses, the total score should be used as a predictor for the other variables

  • Update the total score after each iteration by using passive imputation.

Illustration item and total scores

  • Data set with two questionnaires of 5 items each, and 1 covariate.
  • Isolate item imputation per questionnaire or subscale.
  • Use the predictor matrix: rows are the variables to be imputed, columns are the predictors (1 = used as predictor, 0 = not used).

For Q1:

>      Q1i1 Q1i2 Q1i3 Q1i4 Q1i5 Q2i1 Q2i2 Q2i3 Q2i4 Q2i5 TSQ1 TSQ2 cov1
> Q1i1    0    1    1    1    1    0    0    0    0    0    1    1    1
> Q1i2    1    0    1    1    1    0    0    0    0    0    1    1    1
> Q1i3    1    1    0    1    1    0    0    0    0    0    1    1    1
> Q1i4    1    1    1    0    1    0    0    0    0    0    1    1    1
> Q1i5    1    1    1    1    0    0    0    0    0    0    1    1    1

Illustration item and total scores

  • Isolate item imputation per questionnaire or subscale

For Q2:

>      Q1i1 Q1i2 Q1i3 Q1i4 Q1i5 Q2i1 Q2i2 Q2i3 Q2i4 Q2i5 TSQ1 TSQ2 cov1
> Q2i1    0    0    0    0    0    0    1    1    1    1    1    1    1
> Q2i2    0    0    0    0    0    1    0    1    1    1    1    1    1
> Q2i3    0    0    0    0    0    1    1    0    1    1    1    1    1
> Q2i4    0    0    0    0    0    1    1    1    0    1    1    1    1
> Q2i5    0    0    0    0    0    1    1    1    1    0    1    1    1

Illustration item and total scores

  • A total score cannot be used as a predictor for its own items.
>      Q1i1 Q1i2 Q1i3 Q1i4 Q1i5 Q2i1 Q2i2 Q2i3 Q2i4 Q2i5 TSQ1 TSQ2 cov1
> Q1i1    0    1    1    1    1    0    0    0    0    0    0    1    1
> Q1i2    1    0    1    1    1    0    0    0    0    0    0    1    1
> Q1i3    1    1    0    1    1    0    0    0    0    0    0    1    1
> Q1i4    1    1    1    0    1    0    0    0    0    0    0    1    1
> Q1i5    1    1    1    1    0    0    0    0    0    0    0    1    1
> Q2i1    0    0    0    0    0    0    1    1    1    1    1    0    1
> Q2i2    0    0    0    0    0    1    0    1    1    1    1    0    1
> Q2i3    0    0    0    0    0    1    1    0    1    1    1    0    1
> Q2i4    0    0    0    0    0    1    1    1    0    1    1    0    1
> Q2i5    0    0    0    0    0    1    1    1    1    0    1    0    1

Illustration item and total scores

  • Item scores are not used as predictors for the total scores; each total score is predicted by the other total score and the covariate.
>      Q1i1 Q1i2 Q1i3 Q1i4 Q1i5 Q2i1 Q2i2 Q2i3 Q2i4 Q2i5 TSQ1 TSQ2 cov1
> TSQ1    0    0    0    0    0    0    0    0    0    0    0    1    1
> TSQ2    0    0    0    0    0    0    0    0    0    0    1    0    1
> cov1    0    0    0    0    0    0    0    0    0    0    1    1    0

Illustration item and total scores

  • The total score should be used as a predictor for the other variables when it is used in the analysis model.
>      TSQ1 TSQ2 cov1
> TSQ1    0    1    1
> TSQ2    1    0    1
> cov1    1    1    0

Illustration item and total scores

  • Full predictor matrix
>      Q1i1 Q1i2 Q1i3 Q1i4 Q1i5 Q2i1 Q2i2 Q2i3 Q2i4 Q2i5 TSQ1 TSQ2 cov1
> Q1i1    0    1    1    1    1    0    0    0    0    0    0    1    1
> Q1i2    1    0    1    1    1    0    0    0    0    0    0    1    1
> Q1i3    1    1    0    1    1    0    0    0    0    0    0    1    1
> Q1i4    1    1    1    0    1    0    0    0    0    0    0    1    1
> Q1i5    1    1    1    1    0    0    0    0    0    0    0    1    1
> Q2i1    0    0    0    0    0    0    1    1    1    1    1    0    1
> Q2i2    0    0    0    0    0    1    0    1    1    1    1    0    1
> Q2i3    0    0    0    0    0    1    1    0    1    1    1    0    1
> Q2i4    0    0    0    0    0    1    1    1    0    1    1    0    1
> Q2i5    0    0    0    0    0    1    1    1    1    0    1    0    1
> TSQ1    0    0    0    0    0    0    0    0    0    0    0    1    1
> TSQ2    0    0    0    0    0    0    0    0    0    0    1    0    1
> cov1    0    0    0    0    0    0    0    0    0    0    1    1    0
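The full predictor matrix can also be built programmatically instead of typed in. A base R sketch, assuming the variable names used on these slides; the resulting matrix is meant for the `predictorMatrix` argument of `mice::mice()`.

```r
q1 <- paste0("Q1i", 1:5)
q2 <- paste0("Q2i", 1:5)
totals <- c("TSQ1", "TSQ2")
vars <- c(q1, q2, totals, "cov1")

# Rows = variables to impute, columns = predictors (1 = used, 0 = not used)
pred <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
pred[q1, q1] <- 1                     # items predicted by items of the same questionnaire
pred[q2, q2] <- 1
pred[q1, c("TSQ2", "cov1")] <- 1      # plus the other questionnaire's total and the covariate
pred[q2, c("TSQ1", "cov1")] <- 1
pred[c(totals, "cov1"), c(totals, "cov1")] <- 1  # totals and covariate predict each other
diag(pred) <- 0                       # a variable never predicts itself
pred
```

Building the matrix from blocks keeps the rules (isolation per questionnaire, no own total score as predictor) explicit and makes it easy to extend to more questionnaires.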

Passive imputation total score

  • Total score is used as predictor, but not directly imputed.
  • Item scores are imputed.
  • Total scores are recalculated from the imputed item scores: passive imputation

Passive imputation process

During each iteration for Q1:

1. Impute the item scores using the other items of the same questionnaire, the total score(s) of the other questionnaire(s) and the covariate(s).

  • \(Q1i1_i = Q1i2_i + Q1i3_i + Q1i4_i + Q1i5_i + TSQ2_i + cov1_i\)
  • \(Q1i2_i = Q1i1_i + Q1i3_i + Q1i4_i + Q1i5_i + TSQ2_i + cov1_i\)
  • etc.

2. The total score is recalculated using the imputed item scores.

  • \(TSQ1_i = Q1i1_i + Q1i2_i + Q1i3_i + Q1i4_i + Q1i5_i\)

3. The updated total score is used as a predictor for the covariate(s) and the items of the other questionnaires in the next iteration.

  • \(Q2i1_i = Q2i2_i + Q2i3_i + Q2i4_i + Q2i5_i + TSQ1_i + cov1_i\)

Note: the subscript \(_i\) indicates the imputed values from the previous iteration.

Passive imputation code

  • Change the imputation method for the total scores from the default ("pmm") to a passive formula, which starts with ~.
>  TSQ1  TSQ2 
> "pmm" "pmm"
>                                   TSQ1                                   TSQ2 
> "~I(Q1i1 + Q1i2 + Q1i3 + Q1i4 + Q1i5)" "~I(Q2i1 + Q2i2 + Q2i3 + Q2i4 + Q2i5)"
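A base R sketch of how the passive formulas above can be generated rather than typed, assuming every variable starts with "pmm" (a hypothetical starting point; in mice the defaults come from `make.method()`). A method string starting with `~` tells mice to compute the variable from the formula instead of imputing it.

```r
q1 <- paste0("Q1i", 1:5)
q2 <- paste0("Q2i", 1:5)
vars <- c(q1, q2, "TSQ1", "TSQ2", "cov1")

# Hypothetical default: predictive mean matching for every variable
meth <- setNames(rep("pmm", length(vars)), vars)

# Passive imputation: recompute each total score from its (imputed) items
meth["TSQ1"] <- paste0("~I(", paste(q1, collapse = " + "), ")")
meth["TSQ2"] <- paste0("~I(", paste(q2, collapse = " + "), ")")
meth[c("TSQ1", "TSQ2")]
```

The vector would then be passed to mice together with the predictor matrix, e.g. `mice(data, method = meth, predictorMatrix = pred)`.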

Missing data in IRT models

  • Some estimation methods for IRT models handle missing values under the MAR assumption
  • For example maximum likelihood estimation

Missing data in SEM models

  • Full Information Maximum Likelihood estimation
  • In R: the lavaan package, which can mimic Mplus (mimic = "Mplus")

Missing item scores

  • Item level information may need a different strategy in order to use all available information
  • When there are missing values at item level only, use a strategy that involves the item scores.
  • When mostly the full questionnaire is missing, the missing data can be dealt with at the total score level.

Longitudinal missing data

Repeated measurements data

  • In longitudinal studies, people are followed over a longer period of time.
  • For example RCT with follow-up.
    • Baseline
    • Post-treatment
    • Follow-up
  • Intensive longitudinal data: sensor measurements, daily measures.

Challenges

  • Multiple variables at each time-point
  • Repeated measurements are correlated within persons: multilevel structure
  • Measurement time points sometimes do not line up

Data structure

  • Wide data format
    • One row per person, repeated measurements in the columns
  • Long data format
    • Multiple rows per person, one column as time-point indicator

Wide data format

>    id group          T0          T1        T2
> 1   1     0 -0.89273971  0.62595050 1.6787242
> 2   2     1  1.09466683  1.66393217 2.6314272
> 3   3     0  0.53677979  1.99526687 3.0587021
> 4   4     1 -0.08057365  1.35645288 1.5600102
> 5   5     0 -0.24587013 -0.06647697 0.5832191
> 6   6     1  0.60444222  1.45718160 2.7469386
> 7   7     0  0.00417498  0.21136080 1.4365193
> 8   8     1  1.94695848  2.76345288 3.6306766
> 9   9     0 -2.22879834 -0.75850320 0.8149130
> 10 10     1 -0.23124489  1.28434128 2.1040148
> 11 11     0  1.53801265  2.04635754 2.6932853
> 12 12     1  0.65401059  1.46172168 2.0400761
> 13 13     0 -0.29052400 -0.94582269 1.3305414
> 14 14     1  0.30578755  1.01441996 2.6523174
> 15 15     0 -0.70110697  0.67496886 1.4098061

Long data format

>    id group time     outcome
> 1   1     0   T0 -0.89273971
> 2   1     0   T1  0.62595050
> 3   1     0   T2  1.67872419
> 4   2     1   T0  1.09466683
> 5   2     1   T1  1.66393217
> 6   2     1   T2  2.63142721
> 7   3     0   T0  0.53677979
> 8   3     0   T1  1.99526687
> 9   3     0   T2  3.05870208
> 10  4     1   T0 -0.08057365
> 11  4     1   T1  1.35645288
> 12  4     1   T2  1.56001019
> 13  5     0   T0 -0.24587013
> 14  5     0   T1 -0.06647697
> 15  5     0   T2  0.58321912
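Converting between the two formats can be done with base R's `reshape()`. A minimal sketch using the first three persons of the wide example, with the values rounded to two decimals; the column names follow the tables above.

```r
wide <- data.frame(id = 1:3, group = c(0, 1, 0),
                   T0 = c(-0.89, 1.09, 0.54),
                   T1 = c(0.63, 1.66, 2.00),
                   T2 = c(1.68, 2.63, 3.06))

# Wide -> long: one row per person per time point
long <- reshape(wide, direction = "long",
                varying = c("T0", "T1", "T2"), v.names = "outcome",
                timevar = "time", times = c("T0", "T1", "T2"),
                idvar = "id")
long <- long[order(long$id, long$time), ]
rownames(long) <- NULL
long
```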

Missing data imputation

  • Wide imputation
  • Long imputation: multilevel imputation

Wide data imputation

  • Similar to multiple imputation on cross-sectional data
  • The number of variables in the imputation model may be a challenge
  • The imputation model should be compatible with the analysis model

Example wide imputation

Data set for two groups with measurements at three time-points for:

  • outcome
  • covariate 1
  • covariate 2
  • covariate 3

In total: 13 variables in the wide imputation (group plus 4 variables at 3 time points). This number grows quickly when more variables are measured at each time point.

Practical solution for wide imputation

  • Leave out the other covariates measured at other time points as predictors in the imputation model
  • Adapt the predictor matrix accordingly

Example predictormatrix for cov1

>         outcome_T0 outcome_T1 outcome_T2 cov1_T0 cov1_T1 cov1_T2 cov2_T0
> cov1_T0          1          1          1       0       1       1       1
> cov1_T1          1          1          1       1       0       1       0
> cov1_T2          1          1          1       1       1       0       0
>         cov2_T1 cov2_T2 cov3_T0 cov3_T1 cov3_T2
> cov1_T0       0       0       1       0       0
> cov1_T1       1       0       0       1       0
> cov1_T2       0       1       0       0       1
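A base R sketch that builds these rows for all three covariates, assuming the rule illustrated above: a covariate at a given time point is predicted by the outcome at all time points, by itself at the other time points, and by the other covariates at the same time point. Variable names follow the slide; the outcome rows are left at 0 here because the slide does not show them.

```r
times <- c("T0", "T1", "T2")
vars  <- c("outcome", "cov1", "cov2", "cov3")
cols  <- paste(rep(vars, each = length(times)), times, sep = "_")

pred <- matrix(0, nrow = length(cols), ncol = length(cols),
               dimnames = list(cols, cols))

for (v in setdiff(vars, "outcome")) {
  for (tp in times) {
    row <- paste(v, tp, sep = "_")
    pred[row, paste("outcome", times, sep = "_")] <- 1                   # outcome at all time points
    pred[row, paste(v, setdiff(times, tp), sep = "_")] <- 1              # same covariate, other time points
    pred[row, paste(setdiff(vars, c("outcome", v)), tp, sep = "_")] <- 1 # other covariates, same time point
  }
}
# Outcome rows are not specified on the slide and stay 0; fill them in before use.
pred[paste("cov1", times, sep = "_"), ]
```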

Imputation of long data

  • Multilevel imputation
  • Account for the clustering of measurements within subjects

Methods for multilevel imputation

  • The miceadds package contains many additional (multilevel) imputation methods
  • The broom.mixed package enables the pool() function of mice for mixed-model results

Long imputation predictormatrix

  • Use -2 for the cluster variable id in the predictor matrix (random intercept)
>         id group time outcome cov1 cov2 cov3
> id       0     1    1       1    1    1    1
> group   -2     0    1       1    1    1    1
> time    -2     1    0       1    1    1    1
> outcome -2     1    1       0    1    1    1
> cov1    -2     1    1       1    0    1    1
> cov2    -2     1    1       1    1    0    1
> cov3    -2     1    1       1    1    1    0
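The same matrix in base R, as a sketch: start from all ones, remove the diagonal, and code the id column as -2 so that, in mice, id is treated as the cluster variable rather than as an ordinary predictor.

```r
vars <- c("id", "group", "time", "outcome", "cov1", "cov2", "cov3")
pred <- matrix(1, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
diag(pred) <- 0         # no variable predicts itself
pred[-1, "id"] <- -2    # id marks the cluster (person) for the two-level methods
pred
```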

Long imputation method

  • Change the method to "2l.pmm" (from the miceadds package) for the variables to be imputed
>       id    group     time  outcome     cov1     cov2     cov3 
>       ""       ""       "" "2l.pmm" "2l.pmm"       ""       ""
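A sketch of the method vector above and how it could be used, assuming a long-format data frame called `dat` (hypothetical) and that miceadds is installed for `2l.pmm`; `pred` stands for the predictor matrix with -2 for id shown on the previous slide. The commented lines use `mice()`, `with()` and `pool()` from mice and `lmer()` from lme4; pooling lmer fits relies on broom.mixed.

```r
vars <- c("id", "group", "time", "outcome", "cov1", "cov2", "cov3")
meth <- setNames(rep("", length(vars)), vars)  # "" = do not impute
meth[c("outcome", "cov1")] <- "2l.pmm"        # two-level predictive mean matching (miceadds)
meth

# Assumed workflow (requires mice, miceadds, lme4 and broom.mixed):
# imp <- mice(dat, method = meth, predictorMatrix = pred, m = 5)
# fit <- with(imp, lmer(outcome ~ cov2 + (1 | id)))
# summary(pool(fit))
```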

FIML for missing outcomes

  • Missing values in the dependent variable are handled by maximum likelihood estimation
    • \(Outcome_{ij} = \beta_0 + \beta_1 \cdot cov2_{ij} + b_{0j} + \epsilon_{ij}\)
    • \(b_{0j}\) is the random intercept for subject \(j\); \(i\) indexes the measurements within a subject
  • The estimation uses all observed values of the dependent variable to estimate the model parameters

Missing covariate data are handled by listwise deletion

Example no imputation

  • Among the covariates, only cov1 has missing data
  • Outcome variable has missing observations (drop-out)
>     id group time cov2 cov3 cov1 outcome   
> 111  1     1    1    1    1    1       1  0
> 20   1     1    1    1    1    1       0  1
> 18   1     1    1    1    1    0       1  1
> 1    1     1    1    1    1    0       0  2
>      0     0    0    0    0   19      21 40
> # A tibble: 4 x 8
>   effect   group    term         estimate std.error statistic conf.low conf.high
>   <chr>    <chr>    <chr>           <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
> 1 fixed    <NA>     (Intercept)    1.06       0.130     8.10     0.801     1.31 
> 2 fixed    <NA>     cov2           0.0190     0.115     0.166   -0.206     0.244
> 3 ran_pars id       sd__(Interc~   0.589     NA        NA       NA        NA    
> 4 ran_pars Residual sd__Observa~   1.11      NA        NA       NA        NA

Example with imputation

  • For comparison, impute the outcome (and cov1).
>          term   estimate std.error statistic        df      p.value
> 1 (Intercept) 1.06633608 0.1325362 8.0456241 111.34485 1.018519e-12
> 2        cov2 0.01953625 0.1163277 0.1679415  86.62942 8.670208e-01